[DRAFT] Subgroup 2D Block IO lowering example for AxB GEMM #4608

alexbaden · 2025-07-02T20:56:15Z

This PR demonstrates a possible lowering path for Load Ops with Subgroup 2D Block IO layout.

The 2D block IO instruction is dispatched in a single loop over the registers of a work-item, which is consistent with other Triton SIMT backends.
Layout conversion is handled automatically when the ConvertLayoutOp is lowered to LLVM.
The subgroup 2D Block IO load lowering does not depend on the DPAS layout or the Dot operand.
Performance is currently worse by 25-30%. This is likely attributable to either register ordering or bitcasts.
Transpose is not yet supported. Some other parameters are hard-coded. I expect some test failures.
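
To make the "single loop over registers" idea concrete, here is a minimal Python sketch. It is not the code in this PR: `BlockLoad`, `plan_block_loads`, and the `regs_per_load` parameter are hypothetical names, and a plain row-major tile order stands in for whatever register-to-offset mapping the Subgroup 2D Block IO layout actually provides. It only shows the shape of the loop: one flat walk over a work-item's registers, issuing one 2D block load each time a new group of registers begins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLoad:
    """One hypothetical 2D block load: top-left corner plus tile shape."""
    row: int
    col: int
    height: int
    width: int


def plan_block_loads(tensor_shape, tile_shape, regs_per_load):
    """Plan block loads with a single flat loop over a work-item's registers.

    Assumptions (illustrative only, not taken from the PR):
      * each block load fills `regs_per_load` consecutive registers;
      * tiles are visited in row-major order.
    """
    rows, cols = tensor_shape
    tile_h, tile_w = tile_shape
    if rows % tile_h or cols % tile_w:
        raise ValueError("tensor shape must be a multiple of the tile shape")

    tiles_per_row = cols // tile_w
    num_tiles = (rows // tile_h) * tiles_per_row
    num_regs = num_tiles * regs_per_load

    loads = []
    # Single loop over registers: a new load starts every `regs_per_load`.
    for reg in range(0, num_regs, regs_per_load):
        tile = reg // regs_per_load
        loads.append(
            BlockLoad(
                row=(tile // tiles_per_row) * tile_h,
                col=(tile % tiles_per_row) * tile_w,
                height=tile_h,
                width=tile_w,
            )
        )
    return loads


if __name__ == "__main__":
    for load in plan_block_loads((32, 64), (8, 16), regs_per_load=8):
        print(load)
```

Because the loop is indexed by register rather than by DPAS tile or Dot operand index, nothing in this sketch needs to know how the loaded values are consumed later; that is the property the description above attributes to the new lowering.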

Note that this depends on #4463, #4500, and #4549. Those should probably be merged first, especially since they pass tests and have been validated against the benchmarks. Merging them will dramatically reduce the scope of this PR.

alexbaden added 20 commits June 17, 2025 18:21

Add pass to convert block load to subgroup 2d block encoding types

09fec1e

Mark subgroup 2d block loads as expensive loads

f7e81ce

WIP: use the subgroup 2d block layout in LoadStoreOpToLLVM

e301b43

add missing definition WIP: use new encoding in load store op to llvm

remove convert layout op which converts subgroup2d block to dpas

3e149ae

this is being handled by loadstoreoptollvm currently

remove debug code

22b79fb

do not add barrier op for subgroup 2d block -> dpas conversion

b84cbf9

fixup handling of tensor ptrs when lowering to gather load (with subgroup 2d block layout)

f83bed8

use dpas tensor type for packed load type (essentially "post-conversion")

2b354d2

fixup op type override

f06f90b

do not apply subgroup 2d block encoding to A transpose

7e5b52b

Store transpose attribute in Subgroup2DBlockIO layouts

234897f

Compute final load shape in Subgroup2DBlockIO layout (except for oneMatrixPerBT)

465b99c

remove debug code

e85d577

Remove legacy tile layout code

9985b76

Separate subgroup 2d block encoding lowering from dpas block io lowering 1/?

6652cac

checkpoint: a matrix working

1378934

b working

6a3ec80

remove the convert operator delete

b129f14

add a basic unit test

e436f30

remove debug code and format

41198c9
